
(CVPR 2015) Deep neural networks are easily fooled: High confidence predictions for unrecognizable images

Nguyen A, Yosinski J, Clune J. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 427-436.



1. Overview


1.1. Motivation

A recent study revealed that changing an image in a way imperceptible to humans can cause a DNN to label the image as something else entirely.

In this paper, the authors:

  • show that it is easy to produce images that are completely unrecognizable to humans, yet a DNN believes with 99.99% confidence that they are recognizable objects
  • use evolutionary algorithms with two image encodings: a direct encoding (raw pixels) and an indirect encoding (CPPN)
  • also use gradient ascent in pixel space

1.2. Procedure
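
Candidate images are generated by an evolutionary algorithm; each image is scored by the trained DNN, and the confidence assigned to a target class is used as the fitness (the paper uses MAP-Elites so that images for all classes are evolved in a single run). Alternatively, images are optimized directly with gradient ascent (Section 2.3).

Below is a minimal sketch of the evolutionary loop, simplified to a single-class hill climber; `dnn_confidence` and `mutate` are hypothetical placeholders standing in for the trained network's forward pass and for the encodings described in Section 2.

```python
import numpy as np

def dnn_confidence(image, target_class):
    """Hypothetical helper: forward `image` through the trained DNN
    (LeNet / AlexNet in the paper) and return its softmax confidence
    for `target_class`."""
    raise NotImplementedError

def mutate(image, generation, rng):
    """Hypothetical helper: perturb the image encoding
    (direct pixel encoding or CPPN, see Section 2)."""
    raise NotImplementedError

def evolve_fooling_image(target_class, shape=(28, 28), generations=20000, seed=0):
    """Single-class hill climber: the fitness is simply the DNN's confidence."""
    rng = np.random.default_rng(seed)
    parent = rng.uniform(0, 255, size=shape)        # random starting image
    parent_fit = dnn_confidence(parent, target_class)
    for g in range(generations):
        child = mutate(parent, g, rng)
        child_fit = dnn_confidence(child, target_class)
        if child_fit > parent_fit:                  # keep the child only if the DNN is more confident
            parent, parent_fit = child, child_fit
    return parent, parent_fit
```
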



1.3. Dataset

  • MNIST
  • ImageNet

1.4. Network

  • AlexNet
  • LeNet
  • CaffeNet

1.5. Discussion

  • the area a discriminative model allocates to a class may be much larger than the area occupied by the training examples for that class, so synthetic images far from the training data can still receive high confidence



  • Applications / security implications: fooling face or voice recognition in security cameras, manipulating search-engine rankings (e.g., via an image's background), and attacking driverless cars with generated fooling images



2. Generating Fooling Images


2.1. Direct Encoding



  • Evolved images are unrecognizable to humans.
    • each pixel is initialized uniformly at random within [0, 255]
    • each pixel has a 10% chance of being mutated; the mutation rate is halved every 1000 generations
    • chosen pixels are perturbed with the polynomial mutation operator at a fixed mutation strength of 15 (a sketch of this operator follows the list)
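
A minimal sketch of this mutation step, assuming NumPy; the operator follows Deb's standard polynomial mutation with strength η = 15, and the per-pixel `rate` is expected to be decayed by the caller as described above.

```python
import numpy as np

def polynomial_mutation(pixels, rate, eta=15.0, lo=0.0, hi=255.0, rng=None):
    """Mutate each pixel with probability `rate` using polynomial mutation.

    Larger `eta` (mutation strength) keeps mutated values closer to the parent;
    the paper fixes eta = 15.
    """
    if rng is None:
        rng = np.random.default_rng()
    out = np.array(pixels, dtype=float)                 # work on a float copy
    mask = rng.random(out.shape) < rate                 # which pixels get mutated
    u = rng.random(out.shape)
    delta = np.where(
        u < 0.5,
        (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0,                 # negative perturbation
        1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0)),         # positive perturbation
    )
    out[mask] += delta[mask] * (hi - lo)                # scale by the pixel range
    return np.clip(out, lo, hi)
```
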

2.2. Indirect Encoding



  • Evolved images are more recognizable to humans.
    • produces images containing compressible patterns (symmetry, repetition)
    • based on a Compositional Pattern-Producing Network (CPPN): it takes a pixel's (x, y) coordinates as input and outputs that pixel's value (a toy sketch follows the list)
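
A toy sketch of the CPPN idea, assuming NumPy: the network maps each pixel's coordinates (plus its distance from the center) to an intensity. The fixed topology, activation choices, and random weights here are illustrative only; the paper evolves CPPN topology and weights with NEAT.

```python
import numpy as np

def render_cppn(weights, size=64):
    """Render a grayscale image from a tiny fixed-topology CPPN:
    inputs (x, y, r) -> three hidden units (sine / Gaussian / sigmoid) -> one output pixel."""
    ys, xs = np.mgrid[0:size, 0:size] / (size - 1) * 2.0 - 1.0   # coordinates in [-1, 1]
    r = np.sqrt(xs ** 2 + ys ** 2)                               # distance from the image center
    inputs = np.stack([xs, ys, r], axis=-1)                      # shape (size, size, 3)

    w_hidden, w_out = weights                                    # shapes (3, 3) and (3,)
    h = inputs @ w_hidden                                        # hidden pre-activations
    # Mixing periodic and symmetric activations is what yields the regular,
    # repetitive patterns characteristic of CPPN images.
    h = np.stack([np.sin(h[..., 0]),
                  np.exp(-h[..., 1] ** 2),
                  1.0 / (1.0 + np.exp(-h[..., 2]))], axis=-1)
    out = 1.0 / (1.0 + np.exp(-(h @ w_out)))                     # pixel intensity in (0, 1)
    return (out * 255).astype(np.uint8)

# Example: render one pattern from random weights (in the paper, evolution would
# search over these weights and over the network topology itself).
rng = np.random.default_rng(0)
image = render_cppn((rng.normal(size=(3, 3)), rng.normal(size=3)))
```
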


2.3. Gradient Ascent

  • maximize the softmax output for a target class via gradient ascent in pixel space to find a fooling image (sketched below)
  • employ L2 regularization to produce images with some recognizable class features (e.g., a dog face, fox ears)
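
A minimal sketch of the gradient-ascent variant, assuming PyTorch and some pretrained ImageNet classifier passed in as `model` (the paper's own experiments use Caffe); SGD's `weight_decay` term supplies the L2 regularization.

```python
import torch

def gradient_ascent_fooling(model, target_class, steps=200, lr=1.0, weight_decay=1e-3,
                            shape=(1, 3, 224, 224)):
    """Ascend the softmax probability of `target_class` directly in pixel space."""
    model.eval()
    x = torch.zeros(shape, requires_grad=True)                    # start from a blank image
    opt = torch.optim.SGD([x], lr=lr, weight_decay=weight_decay)  # weight_decay = L2 penalty on pixels
    for _ in range(steps):
        opt.zero_grad()
        prob = torch.softmax(model(x), dim=1)[0, target_class]
        (-prob).backward()                                        # minimizing -prob == maximizing prob
        opt.step()
    with torch.no_grad():                                         # final confidence of the optimized image
        prob = torch.softmax(model(x), dim=1)[0, target_class]
    return x.detach(), prob.item()
```
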



3. Experiments


3.1. Direct Encoding on ImageNet

The direct encoding is much less successful at producing high-confidence images on ImageNet than on MNIST (larger dataset with more classes → less overfitting → harder to fool).



3.2. Indirect Encoding on ImageNet

  • Much more successful than the direct encoding.



  • Evolution produces similar images for closely related categories.




3.3. Generalization

3.3.1. Same Architecture & Different Initialization



  • many images that fool network A also fool network B (same architecture, different random initialization)
  • but some fooling images remain specific to one of the two networks

3.3.2. Different Architecture

  • many fooling images generalize across DNN architectures

3.4. Training Networks to Recognize Fooling Images

  • (MNIST) retraining with fooling images added as an extra class does not help much: new fooling images are evolved with confidences similar to before
  • (ImageNet) on the contrary, retraining makes it noticeably harder to evolve new high-confidence fooling images